Anonymization of electronic medical records for validating genome-wide association studies.

Authors

  • Grigorios Loukides
  • Aris Gkoulalas-Divanis
  • Bradley Malin
Abstract

Genome-wide association studies (GWAS) facilitate the discovery of genotype-phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients' standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data "as is" may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients' identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them so that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks.
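
The linkage threat described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it is a simplified check, under assumed conventions, that flags combinations of clinical codes shared by fewer than k patients, since such combinations could narrow a genomic sequence down to a small group. The function name `risky_code_sets`, the threshold `k`, the `max_size` limit, and the toy records are all hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def risky_code_sets(records, k, max_size=2):
    """Return code combinations (up to max_size codes) shared by fewer than k patients.

    records: list of per-patient lists of clinical codes (hypothetical toy format).
    k: assumed minimum number of patients a combination must match to be considered safe.
    """
    support = Counter()
    for codes in records:
        unique = sorted(set(codes))  # ignore repeated codes within one patient
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                support[combo] += 1
    # Keep only combinations that match fewer than k patients
    return {combo: n for combo, n in support.items() if n < k}


# Toy records: each inner list is one patient's diagnosis codes (made-up data)
records = [
    ["250.00", "401.1"],
    ["250.00", "401.1"],
    ["250.00", "272.4"],
]
risky = risky_code_sets(records, k=2)
```

In this toy data, any combination containing the code held by only one patient is flagged. An anonymization method would then have to modify such codes, for example by generalizing or suppressing them, while trying to preserve the code sets relevant to the diseases under study; how that trade-off is made is specific to the paper and is not reproduced here.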

Similar resources

Genetics Identification of Genomic Predictors of Atrioventricular Conduction Using Electronic Medical Records as a Tool for Genome Science

Background—Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown. Methods and Results—We performed a genome-wide association study on 2334 European American patients with normal ECGs withou...

Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science.

BACKGROUND Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown. METHODS AND RESULTS We performed a genome-wide association study on 2334 European American patients with normal ECGs with...

Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk.

BACKGROUND ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias. METHODS AND RESULTS We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 ...

Efficient genome-wide association in biobanks using topic modeling identifies multiple novel disease loci.

Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that may be unreliable and fail to capture the relationship between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records (EHR) for 10845 participants in a biobanking progra...

Genome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review

Recently genetic studies have been revolutionized by next generation sequencing (NGS) technology, and it is expected that the use of this technology will largely eliminate defects in the methods of association studies. The NGS technology is becoming the premier tool in genetics. However, at the moment the use of this method is limited especially in the livestock due to high cost and computation...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 107, Issue 17

Pages: -

Publication date: 2010